GPGPU: general purpose computation on graphics hardware

David Luebke, Mark Harris, Jens Kruger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff
Woolley, Aaron Lefohn

August 2004 ACM SIGGRAPH 2004 Course Notes SIGGRAPH '04 
Publisher: ACM Press 

Full text available: pdf(63.03 MB) Additional Information: full citation, abstract, citings

The graphics processor (GPU) on today's commodity video cards has evolved into an 
extremely powerful and flexible processor. The latest graphics architectures provide 
tremendous memory bandwidth and computational horsepower, with fully programmable 
vertex and pixel processing units that support vector operations up to full IEEE floating 
point precision. High level languages have emerged for graphics hardware, making this 
computational power accessible. Architecturally, GPUs are highly parallel s ... 



External memory algorithms and data structures: dealing with massive data 
Jeffrey Scott Vitter 

June 2001 ACM Computing Surveys (CSUR), Volume 33 issue 2 
Publisher: ACM Press 

Full text available: pdf(828.46 KB) Additional Information: full citation, abstract, references, citings, index terms



Data sets in large applications are often too massive to fit completely inside the 
computer's internal memory. The resulting input/output communication (or I/O) between
fast internal memory and slower external memory (such as disks) can be a major 
performance bottleneck. In this article we survey the state of the art in the design and 
analysis of external memory (or EM) algorithms and data structures, where the goal is to 
exploit locality in order to reduce the I/O costs. We consider a varie ... 

Keywords: B-tree, I/O, batched, block, disk, dynamic, extendible hashing, external 
memory, hierarchical memory, multidimensional access methods, multilevel memory, 
online, out-of-core, secondary storage, sorting 
Ning An, Sudhanva Gurumurthi, Anand Sivasubramaniam, Narayanan Vijaykrishnan, Mahmut
Kandemir, Mary Jane Irwin 

November 2002 The VLDB Journal — The International Journal on Very Large Data 

Bases, Volume 11 Issue 3 
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(641.55 KB) Additional Information: full citation, abstract, citings, index terms

The proliferation of mobile and pervasive computing devices has brought energy 
constraints into the limelight. Energy-conscious design is important at all levels of system 
architecture, and the software has a key role to play in conserving battery energy on 
these devices. With the increasing popularity of spatial database applications, and their 
anticipated deployment on mobile devices (such as road atlases and GPS-based 
applications), it is critical to examine the energy implications of spatial ... 

Keywords: Energy optimization, Multidimensional indexing, Resource-constrained 
computing, Spatial data 



Improving instruction cache performance in OLTP
Stavros Harizopoulos, Anastassia Ailamaki 

September 2006 ACM Transactions on Database Systems (TODS), volume 31 issue 3 
Publisher: ACM Press 

Full text available: pdf(783.16 KB) Additional Information: full citation, abstract, references, index terms

Instruction-cache misses account for up to 40% of execution time in online
transaction processing (OLTP) database workloads. In contrast to data cache misses, 
instruction misses cannot be overlapped with out-of-order execution. Chip design 
limitations do not allow increases in the size or associativity of instruction caches that 
would help reduce misses. On the contrary, the effective instruction cache size is 
expected to further decrease with the adoption of multicore and multithreading ... 

Keywords: Instruction cache, cache misses 
Stevan Vlaovic, Edward S. Davidson

June 2002 Proceedings of the 16th international conference on Supercomputing ICS '02

Publisher: ACM Press 

Full text available: pdf(179.52 KB) Additional Information: full citation, abstract, references, index terms

Trace caches are used to help dynamic branch prediction make multiple predictions in a
cycle by embedding some of the predictions in the trace. In this work, we evaluate a trace 
cache that is capable of delivering a trace consisting of a variable number of instructions 
via a linked list mechanism. We evaluate several schemes in the context of an x86
processor model that stores decoded instructions. By developing a new classification for 
trace cache accesses, we are able to target those misses t ... 

Keywords: branch prediction, optimization, trace cache, x86 



8 Cache Refill/Access Decoupling for Vector Machines
Christopher Batten, Ronny Krashinsky, Steve Gerding, Krste Asanovic 

December 2004 Proceedings of the 37th annual IEEE/ACM International Symposium
on Microarchitecture MICRO 37
Publisher: IEEE Computer Society 

Full text available: pdf(319.32 KB) Additional Information: full citation, abstract, citings

Vector processors often use a cache to exploit temporal locality and reduce memory 
bandwidth demands, but then require expensive logic to track large numbers of 
outstanding cache misses to sustain peak bandwidth from memory. We present 
refill/access decoupling, which augments the vector processor with a Vector Refill Unit 
(VRU) to quickly pre-execute vector memory commands and issue any needed cache line
refills ahead of regular execution. The VRU reduces costs by eliminating much of the 
outstan ... 

9 Fast detection of communication patterns in distributed executions
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced 
Studies on Collaborative research CASCON '97 

Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the 
execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very complex 
and do not provide the user with the desired overview of the application. In our 
experience, such tools display repeated occurrences of non-trivial commun ...

10 Inverted files for text search engines
Justin Zobel, Alistair Moffat

July 2006 ACM Computing Surveys (CSUR), volume 38 issue 2
Publisher: ACM Press 

Full text available: pdf(944.29 KB) Additional Information: full citation, abstract, references, index terms

The technology underlying text search engines has advanced dramatically in the past 
decade. The development of a family of new index representations has led to a wide 
range of innovations in index storage, index construction, and query evaluation. While 
some of these developments have been consolidated in textbooks, many specific 
techniques are not widely known or the textbook descriptions are out of date. In this 
tutorial, we introduce the key techniques in the area, describing both a core impl ... 

Keywords: Inverted file indexing, Web search engine, document database, information 
retrieval, text retrieval 
Publisher: ACM Press

Full text available: pdf(385.22 KB) Additional Information: full citation, abstract, references, citings, index terms

This tutorial surveys design methods for energy-efficient system-level design. We 
consider electronic systems consisting of a hardware platform and software layers. We
consider the three major constituents of hardware that consume energy, namely 
computation, communication, and storage units, and we review methods of reducing their 
energy consumption. We also study models for analyzing the energy cost of software, and 
methods for energy-efficient software design and compilation. This survey ...

12 Exploiting perception in high-fidelity virtual environments: Exploiting perception in
high-fidelity virtual environments

Additional presentations from the 24th course are available on the citation
page

Mashhuda Glencross, Alan G. Chalmers, Ming C. Lin, Miguel A. Otaduy, Diego Gutierrez 
July 2006 ACM SIGGRAPH 2006 Courses SIGGRAPH '06

Publisher: ACM Press 

Full text available: pdf(5.07 MB) Additional Information: full citation, abstract, references
mov(68:6 MIN)

The objective of this course is to provide an introduction to the issues that must be 
considered when building high-fidelity 3D engaging shared virtual environments. The 
principles of human perception guide important development of algorithms and 
techniques in collaboration, graphical, auditory, and haptic rendering. We aim to show 
how human perception is exploited to achieve realism in high fidelity environments within 
the constraints of available finite computational resources. In this course w ... 

Keywords: collaborative environments, haptics, high-fidelity rendering, human-computer 
interaction, multi-user, networked applications, perception, virtual reality 



13 Optimizing instruction cache performance of embedded systems
S. Bartolini, C. A. Prete

November 2005 ACM Transactions on Embedded Computing Systems (TECS), volume 4
Issue 4

Publisher: ACM Press 

Full text available: pdf(817.74 KB) Additional Information: full citation, abstract, references, index terms, review

In the embedded domain, the gap between memory and processor performance and the 
increase in application complexity need to be supported without wasting precious system 
resources: die size, power, etc. For these reasons, effective exploitation of small and 
simple cache memories is of the utmost importance. However, programs running on such 
caches can experience serious inefficiencies due to cache conflicts. We present a new 
Cache-Aware Code Allocation Technique (CAT), which transforms the structu ... 

Keywords: Embedded systems, cache performance, code generation, code reordering, 
conflict miss 
M. Kandemir, J. Ramanujam, U. Sezer

January 2006 ACM Transactions on Design Automation of Electronic Systems
(TODAES), Volume 11 Issue 1
Publisher: ACM Press 

Full text available: pdf(1.08 MB) Additional Information: full citation, abstract, references, index terms

On-chip caches consume a significant fraction of the energy in current microprocessors.
As a result, architectural/circuit-level techniques such as block buffering and sub-banking 
have been proposed and shown to be very effective in reducing the energy consumption 
of on-chip caches. While there has been some work on evaluating the energy and 
performance impact of different block buffering schemes, we are not aware of software 
solutions to take advantage of on-chip cache block buffers. This articl ...

Keywords: Energy optimizations, block buffering, compiler transformations, data cache, 
embedded systems 
David A. Patterson

September 1988 ACM SIGARCH Computer Architecture News, volume 16 issue 4 
Publisher: ACM Press 
16 4.2BSD and 4.3BSD as examples of the UNIX system
John S. Quarterman, Abraham Silberschatz, James L. Peterson
December 1985 ACM Computing Surveys (CSUR), Volume 17 issue 4 

Publisher: ACM Press 

Full text available: pdf(4.07 MB) Additional Information: full citation, abstract, references, citings, index terms, review

This paper presents an in-depth examination of the 4.2 Berkeley Software Distribution, 
Virtual VAX-11 Version (4.2BSD), which is a version of the UNIX Time-Sharing System. 
There are notes throughout on 4.3BSD, the forthcoming system from the University of 
California at Berkeley. We trace the historical development of the UNIX system from its 
conception in 1969 until today, and describe the design principles that have guided this 
development. We then present the internal data structures and ...

17 Power reduction techniques for microprocessor systems
Vasanth Venkatachalam, Michael Franz

September 2005 ACM Computing Surveys (CSUR), volume 37 issue 3
Publisher: ACM Press 

Full text available: pdf(602.33 KB) Additional Information: full citation, abstract, references, index terms

Power consumption is a major factor that limits the performance of computers. We survey 
the "state of the art" in techniques that reduce the total power consumed by a 
microprocessor system over time. These techniques are applied at various levels ranging 
from circuits to architectures, architectures to system software, and system software to 
applications. They also include holistic approaches that will become more important over 
the next decade. We conclude that power management is a ... 

Keywords: Energy dissipation, power reduction 
Publisher: ACM Press

Full text available: pdf(398.75 KB) Additional Information: full citation, abstract, references, index terms




http://portal.acm.org/results.cfm?CFID=l 50941 53&CFTOKEN=55539927&adv=l&COL... 3/30/2007 



Results (page 1): +cache +set +search +sequence +cycle +way ^accessed segment line bl... Page 6 of 6 



In this paper, we propose an approach to estimate the worst-case response time (WCRT) 
of each task in a preemptive multitasking single-processor real-time system utilizing an 
L1 cache. The approach combines intertask cache-eviction analysis and intratask cache-access
analysis to estimate the number of cache lines that can possibly be evicted by the
preempting task and also be accessed again by the preempted task after preemptions 
(thus requiring the preempted task to reload the cache line(s)). T ... 
Level one cache normally resides on a processor's critical path, which determines the
clock frequency. Direct-mapped caches exhibit fast access time but poor hit rates
compared with same sized set-associative caches due to nonuniform accesses to the 
cache sets, which generate more conflict misses in some sets while other sets are 
underutilized. We propose a technique to reduce the miss rate of direct mapped caches 
through balancing the accesses to cache sets. We increase the decoder length and th ... 
The tremendous evolution of programmable graphics hardware has made high-quality
real-time volume graphics a reality. In addition to the traditional application of rendering 
volume data in scientific visualization, the interest in applying these techniques for real- 
time rendering of atmospheric phenomena and participating media such as fire, smoke, 
and clouds is growing rapidly. This course covers both applications in scientific 
visualization, e.g., medical volume data, and real-time rendering, ...